Fixed removal of hosts from certsmap when running certificate auto-renew #4156
yadvr merged 1 commit into apache:master
|
@blueorangutan package |
|
@Spaceman1984 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔centos7 ✔debian. JID-1421 |
|
@blueorangutan test |
|
@Spaceman1984 a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
@Spaceman1984 can you kick packaging and tests again? Was purgeHost being called on a host already removed in the DB? And how does that affect hosts that are already connected to a management server (i.e. hosts not removed in the DB)? |
|
@blueorangutan package |
|
@Spaceman1984 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
@rhtyd I didn't see purgeHost being called on a host already removed in my testing. What I observed was the host being ignored because a check was done on an empty management server field. |
|
Packaging result: ✔centos7 ✔debian. JID-1443 |
|
@blueorangutan test |
|
@Spaceman1984 a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
nvazquez
left a comment
@Spaceman1984 this looks good, but I still think it does not solve the issue completely: when certificates are removed from the in-memory active certificates map on agent disconnection, those certificates are skipped by the certificates loop in the CA background task.
|
@nvazquez there is no problem with the certs map losing certs when hosts disconnect, because the certs map is repopulated when a host connects again. If you provision keys from a different management server, the host will disconnect but can reconnect to any available management server in its list. If you test with 2 management servers and attach your debugger to both at the same time, you will see the certs map being populated on the management server the host is connecting to. |
|
Auto-renewal can only happen while the host has a valid certificate. As soon as the certificate becomes invalid, the host can no longer communicate with the management server and therefore cannot get a new certificate. So auto-renewal must happen before the certificate's validity period runs out. The auto-renewal process was failing, and therefore certificates were not being renewed. Once auto-renewal is fixed, disconnected hosts should not be a problem. |
|
Thanks @Spaceman1984, tested manually and looks good. @rhtyd I noticed the internal active certificates map uses the host IP as the key, which is then used for querying in the DB for that IP. Is there any reason for not using the internal host ID or the UUID as the map key and using the host IP instead? |
|
@blueorangutan package |
|
@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔centos7 ✔debian. JID-1492 |
|
@blueorangutan test |
|
@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
@nvazquez yes, it's because the code that adds the X509 cert to the hashmap (currently) uses the incoming client's IP address as the key. That may need checking, or the map could use the host UUID/ID instead of the IP address if that corrects the implementation. |
|
ping @nvazquez please review |
nvazquez
left a comment
LGTM based on manual testing and code review
yadvr
left a comment
LGTM, but if the certMap is empty that issue may not be fixed by this PR.
|
@blueorangutan package |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔centos7 ✔debian. JID-1554 |
|
@blueorangutan test |
|
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Trillian test result (tid-2034)
|
…to-renewal (apache#4156) When a host connects to a management server, the host IP address and the certificate are stored in memory on the management server. This mapping is checked periodically to determine if any certificates are due to expire. Before a certificate is renewed, a few checks are done to determine if the host is connected to the management server by fetching the host record from the database. The problem here is if the wrong record is fetched, the host is not checked for renewal. This PR improves the host record fetch from the database by looking only at hosts that are not removed. Fixes: apache#4129
Description
When a host connects to a management server, the host IP address and the certificate are stored in memory on the management server. This mapping is checked periodically to determine if any certificates are due to expire.
Before a certificate is renewed, a few checks are done to determine if the host is connected to the management server by fetching the host record from the database. The problem is that if the wrong record is fetched, the host is not checked for renewal.
This PR improves the host record fetch from the database by looking only at hosts that are not removed.
Fixes: #4129
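To illustrate the failure mode described above, here is a minimal, self-contained sketch rather than the actual CloudStack code. It assumes two rows can share the same IP address (an old, removed host and the current one) and that a host is skipped when its management server field is empty, as mentioned in the conversation. `HostRecord`, `findByIp`, `findActiveByIp` and `isRenewalCandidate` are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Optional;

public class HostLookupSketch {
    // Hypothetical stand-in for a host row; this is not CloudStack's HostVO.
    static final class HostRecord {
        final long id;
        final String privateIp;
        final Long managementServerId; // null when no management server owns the host
        final Date removed;            // null while the host has not been removed

        HostRecord(long id, String privateIp, Long managementServerId, Date removed) {
            this.id = id;
            this.privateIp = privateIp;
            this.managementServerId = managementServerId;
            this.removed = removed;
        }
    }

    // Naive lookup: returns the first row with a matching IP, removed or not.
    static Optional<HostRecord> findByIp(List<HostRecord> rows, String ip) {
        return rows.stream()
                .filter(h -> h.privateIp.equals(ip))
                .findFirst();
    }

    // Lookup restricted to rows that are not removed.
    static Optional<HostRecord> findActiveByIp(List<HostRecord> rows, String ip) {
        return rows.stream()
                .filter(h -> h.privateIp.equals(ip) && h.removed == null)
                .findFirst();
    }

    // Assumed check: a host is considered for renewal only if the fetched
    // record exists and has a management server set.
    static boolean isRenewalCandidate(Optional<HostRecord> host) {
        return host.isPresent() && host.get().managementServerId != null;
    }

    public static void main(String[] args) {
        List<HostRecord> rows = new ArrayList<>();
        // A previously removed host that used the same IP address.
        rows.add(new HostRecord(1L, "10.0.0.5", null, new Date(0L)));
        // The currently connected host.
        rows.add(new HostRecord(2L, "10.0.0.5", 100L, null));

        System.out.println(isRenewalCandidate(findByIp(rows, "10.0.0.5")));       // false
        System.out.println(isRenewalCandidate(findActiveByIp(rows, "10.0.0.5"))); // true
    }
}
```

With the naive lookup the removed row is returned first and the connected host is never considered for renewal. Restricting the lookup to rows that are not removed returns the connected host.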
How Has This Been Tested?
This has been tested by setting the ca.framework.cert.validity.period and ca.framework.cert.expiry.alert.period to the same value. This is to ensure that a certificate is up for renewal as soon as it is issued.
The management server logs were then watched to confirm that auto-renewal happens.
This has also been tested by using two management servers and reprovisioning host security keys from the second management server and still having the certs auto-renew.
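The reasoning behind the first test setup can be sketched as follows. This is an illustration under stated assumptions, not the CA framework's actual check: it assumes both settings are expressed in days and that a certificate counts as due for renewal once the time left until expiry is no longer than the alert period. `isDueForRenewal` is a hypothetical helper.

```java
import java.time.Duration;
import java.time.Instant;

public class RenewalWindowSketch {
    // Assumed rule: a certificate is due for renewal once the time remaining
    // until its expiry (notAfter) is within the alert period.
    static boolean isDueForRenewal(Instant now, Instant notAfter, int alertPeriodDays) {
        Duration remaining = Duration.between(now, notAfter);
        return remaining.compareTo(Duration.ofDays(alertPeriodDays)) <= 0;
    }

    public static void main(String[] args) {
        Instant issuedAt = Instant.parse("2020-06-01T00:00:00Z");

        // Validity of 365 days with a 30 day alert period:
        // a freshly issued certificate is not yet due.
        Instant longLived = issuedAt.plus(Duration.ofDays(365));
        System.out.println(isDueForRenewal(issuedAt, longLived, 30)); // false

        // Validity equal to the alert period (both 30 days):
        // the certificate is due as soon as it is issued.
        Instant shortLived = issuedAt.plus(Duration.ofDays(30));
        System.out.println(isDueForRenewal(issuedAt, shortLived, 30)); // true
    }
}
```

Under that assumption, setting the two periods to the same value makes every new certificate an immediate renewal candidate, so the renewal path can be exercised without waiting.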